Goto

Collaborating Authors

 Frederick County


Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis

Partin, Alexander, Vasanthakumari, Priyanka, Narykov, Oleksandr, Wilke, Andreas, Koussa, Natasha, Jones, Sara E., Zhu, Yitan, Overbeek, Jamie C., Jain, Rajeev, Fernando, Gayara Demini, Sanchez-Villalobos, Cesar, Garcia-Cardona, Cristina, Mohd-Yusof, Jamaludin, Chia, Nicholas, Wozniak, Justin M., Ghosh, Souparno, Pal, Ranadip, Brettin, Thomas S., Weil, M. Ryan, Stevens, Rick L.

arXiv.org Artificial Intelligence

Deep learning (DL) and machine learning (ML) models have shown promise in drug response prediction (DRP), yet their ability to generalize across datasets remains an open question, raising concerns about their real-world applicability. Due to the lack of standardized benchmarking approaches, model evaluations and comparisons often rely on inconsistent datasets and evaluation criteria, making it difficult to assess true predictive capabilities. In this work, we introduce a benchmarking framework for evaluating cross-dataset prediction generalization in DRP models. Our framework incorporates five publicly available drug screening datasets, six standardized DRP models, and a scalable workflow for systematic evaluation. To assess model generalization, we introduce a set of evaluation metrics that quantify both absolute performance (e.g., predictive accuracy across datasets) and relative performance (e.g., performance drop compared to within-dataset results), enabling a more comprehensive assessment of model transferability. Our results reveal substantial performance drops when models are tested on unseen datasets, underscoring the importance of rigorous generalization assessments. While several models demonstrate relatively strong cross-dataset generalization, no single model consistently outperforms across all datasets. Furthermore, we identify CTRPv2 as the most effective source dataset for training, yielding higher generalization scores across target datasets. By sharing this standardized evaluation framework with the community, our study aims to establish a rigorous foundation for model comparison, and accelerate the development of robust DRP models for real-world applications.


WavePulse: Real-time Content Analytics of Radio Livestreams

Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay

arXiv.org Artificial Intelligence

Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to track answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at \url{https://wave-pulse.io}.


Pathway-Guided Optimization of Deep Generative Molecular Design Models for Cancer Therapy

Qayyum, Alif Bin Abdul, Mertins, Susan D., Paulson, Amanda K., Urban, Nathan M., Yoon, Byung-Jun

arXiv.org Artificial Intelligence

The data-driven drug design problem can be formulated as an optimization task of a potentially expensive black-box objective function over a huge high-dimensional and structured molecular space. The junction tree variational autoencoder (JTVAE) has been shown to be an efficient generative model that can be used for suggesting legitimate novel drug-like small molecules with improved properties. While the performance of the generative molecular design (GMD) scheme strongly depends on the initial training data, one can improve its sampling efficiency for suggesting better molecules with enhanced properties by optimizing the latent space. In this work, we propose how mechanistic models - such as pathway models described by differential equations - can be used for effective latent space optimization(LSO) of JTVAEs and other similar models for GMD. To demonstrate the potential of our proposed approach, we show how a pharmacodynamic model, assessing the therapeutic efficacy of a drug-like small molecule by predicting how it modulates a cancer pathway, can be incorporated for effective LSO of data-driven models for GMD.


Assessing Reusability of Deep Learning-Based Monotherapy Drug Response Prediction Models Trained with Omics Data

Overbeek, Jamie C., Partin, Alexander, Brettin, Thomas S., Chia, Nicholas, Narykov, Oleksandr, Vasanthakumari, Priyanka, Wilke, Andreas, Zhu, Yitan, Clyde, Austin, Jones, Sara, Gnanaolivu, Rohan, Liu, Yuanhang, Jiang, Jun, Wang, Chen, Knutson, Carter, McNaughton, Andrew, Kumar, Neeraj, Fernando, Gayara Demini, Ghosh, Souparno, Sanchez-Villalobos, Cesar, Zhang, Ruibo, Pal, Ranadip, Weil, M. Ryan, Stevens, Rick L.

arXiv.org Artificial Intelligence

Cancer drug response prediction (DRP) models present a promising approach towards precision oncology, tailoring treatments to individual patient profiles. While deep learning (DL) methods have shown great potential in this area, models that can be successfully translated into clinical practice and shed light on the molecular mechanisms underlying treatment response will likely emerge from collaborative research efforts. This highlights the need for reusable and adaptable models that can be improved and tested by the wider scientific community. In this study, we present a scoring system for assessing the reusability of prediction DRP models, and apply it to 17 peer-reviewed DL-based DRP models. As part of the IMPROVE (Innovative Methodologies and New Data for Predictive Oncology Model Evaluation) project, which aims to develop methods for systematic evaluation and comparison DL models across scientific domains, we analyzed these 17 DRP models focusing on three key categories: software environment, code modularity, and data availability and preprocessing. While not the primary focus, we also attempted to reproduce key performance metrics to verify model behavior and adaptability. Our assessment of 17 DRP models reveals both strengths and shortcomings in model reusability. To promote rigorous practices and open-source sharing, we offer recommendations for developing and sharing prediction models. Following these recommendations can address many of the issues identified in this study, improving model reusability without adding significant burdens on researchers. This work offers the first comprehensive assessment of reusability and reproducibility across diverse DRP models, providing insights into current model sharing practices and promoting standards within the DRP and broader AI-enabled scientific research community.


TMA-Grid: An open-source, zero-footprint web application for FAIR Tissue MicroArray De-arraying

Ge, Aaron, Saha, Monjoy, Duggan, Maire A., Lenz, Petra, Abubakar, Mustapha, García-Closas, Montserrat, Balasubramanian, Jeya, Almeida, Jonas S., Bhawsar, Praphulla MS

arXiv.org Artificial Intelligence

Background: Tissue Microarrays (TMAs) significantly increase analytical efficiency in histopathology and large-scale epidemiologic studies by allowing multiple tissue cores to be scanned on a single slide. The individual cores can be digitally extracted and then linked to metadata for analysis in a process known as de-arraying. However, TMAs often contain core misalignments and artifacts due to assembly errors, which can adversely affect the reliability of the extracted cores during the de-arraying process. Moreover, conventional approaches for TMA de-arraying rely on desktop solutions.Therefore, a robust yet flexible de-arraying method is crucial to account for these inaccuracies and ensure effective downstream analyses. Results: We developed TMA-Grid, an in-browser, zero-footprint, interactive web application for TMA de-arraying. This web application integrates a convolutional neural network for precise tissue segmentation and a grid estimation algorithm to match each identified core to its expected location. The application emphasizes interactivity, allowing users to easily adjust segmentation and gridding results. Operating entirely in the web-browser, TMA-Grid eliminates the need for downloads or installations and ensures data privacy. Adhering to FAIR principles (Findable, Accessible, Interoperable, and Reusable), the application and its components are designed for seamless integration into TMA research workflows. Conclusions: TMA-Grid provides a robust, user-friendly solution for TMA dearraying on the web. As an open, freely accessible platform, it lays the foundation for collaborative analyses of TMAs and similar histopathology imaging data. Availability: Web application: https://episphere.github.io/tma-grid Code: https://github.com/episphere/tma-grid Tutorial: https://youtu.be/miajqyw4BVk


Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

LaBella, Dominic, Baid, Ujjwal, Khanna, Omaditya, McBurney-Lin, Shan, McLean, Ryan, Nedelec, Pierre, Rashid, Arif, Tahon, Nourel Hoda, Altes, Talissa, Bhalerao, Radhika, Dhemesh, Yaseen, Godfrey, Devon, Hilal, Fathi, Floyd, Scott, Janas, Anastasia, Kazerooni, Anahita Fathi, Kirkpatrick, John, Kent, Collin, Kofler, Florian, Leu, Kevin, Maleki, Nazanin, Menze, Bjoern, Pajot, Maxence, Reitman, Zachary J., Rudie, Jeffrey D., Saluja, Rachit, Velichko, Yury, Wang, Chunhao, Warman, Pranav, Adewole, Maruf, Albrecht, Jake, Anazodo, Udunna, Anwar, Syed Muhammad, Bergquist, Timothy, Chen, Sully Francis, Chung, Verena, Conte, Gian-Marco, Dako, Farouk, Eddy, James, Ezhov, Ivan, Khalili, Nastaran, Iglesias, Juan Eugenio, Jiang, Zhifan, Johanson, Elaine, Van Leemput, Koen, Li, Hongwei Bran, Linguraru, Marius George, Liu, Xinyang, Mahtabfar, Aria, Meier, Zeke, Moawad, Ahmed W., Mongan, John, Piraud, Marie, Shinohara, Russell Takeshi, Wiggins, Walter F., Abayazeed, Aly H., Akinola, Rachel, Jakab, András, Bilello, Michel, de Verdier, Maria Correia, Crivellaro, Priscila, Davatzikos, Christos, Farahani, Keyvan, Freymann, John, Hess, Christopher, Huang, Raymond, Lohmann, Philipp, Moassefi, Mana, Pease, Matthew W., Vollmuth, Phillipp, Sollmann, Nico, Diffley, David, Nandolia, Khanak K., Warren, Daniel I., Hussain, Ali, Fehringer, Pascal, Bronstein, Yulia, Deptula, Lisa, Stein, Evan G., Taherzadeh, Mahsa, de Oliveira, Eduardo Portela, Haughey, Aoife, Kontzialis, Marinos, Saba, Luca, Turner, Benjamin, Brüßeler, Melanie M. T., Ansari, Shehbaz, Gkampenis, Athanasios, Weiss, David Maximilian, Mansour, Aya, Shawali, Islam H., Yordanov, Nikolay, Stein, Joel M., Hourani, Roula, Moshebah, Mohammed Yahya, Abouelatta, Ahmed Magdy, Rizvi, Tanvir, Willms, Klara, Martin, Dann C., Okar, Abdullah, D'Anna, Gennaro, Taha, Ahmed, Sharifi, Yasaman, Faghani, Shahriar, Kite, Dominic, Pinho, Marco, Haider, Muhammad Ammar, Aristizabal, Alejandro, Karargyris, Alexandros, Kassem, Hasan, Pati, Sarthak, Sheller, Micah, Alonso-Basanta, Michelle, Villanueva-Meyer, Javier, Rauschecker, Andreas M., Nada, Ayman, Aboian, Mariam, Flanders, Adam E., Wiestler, Benedikt, Bakas, Spyridon, Calabrese, Evan

arXiv.org Artificial Intelligence

We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning automated segmentation models using image data from the largest multi-institutional systematically expert annotated multilabel multi-sequence meningioma MRI dataset to date, which included 1000 training set cases, 141 validation set cases, and 283 hidden test set cases. Each case included T2, T2/FLAIR, T1, and T1Gd brain MRI sequences with associated tumor compartment labels delineating enhancing tumor, non-enhancing tumor, and surrounding non-enhancing T2/FLAIR hyperintensity. Participant automated segmentation models were evaluated and ranked based on a scoring system evaluating lesion-wise metrics including dice similarity coefficient (DSC) and 95% Hausdorff Distance. The top ranked team had a lesion-wise median dice similarity coefficient (DSC) of 0.976, 0.976, and 0.964 for enhancing tumor, tumor core, and whole tumor, respectively and a corresponding average DSC of 0.899, 0.904, and 0.871, respectively. These results serve as state-of-the-art benchmarks for future pre-operative meningioma automated segmentation algorithms. Additionally, we found that 1286 of 1424 cases (90.3%) had at least 1 compartment voxel abutting the edge of the skull-stripped image edge, which requires further investigation into optimal pre-processing face anonymization steps.


Translation of Multifaceted Data without Re-Training of Machine Translation Systems

Moon, Hyeonseok, Lee, Seungyoon, Hong, Seongtae, Lee, Seungjun, Park, Chanjun, Lim, Heuiseok

arXiv.org Artificial Intelligence

Translating major language resources to build minor language resources becomes a widely-used approach. Particularly in translating complex data points composed of multiple components, it is common to translate each component separately. However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a novel MT pipeline that considers the intra-data relation in implementing MT for training data. In our MT pipeline, all the components in a data point are concatenated to form a single translation sequence and subsequently reconstructed to the data components after translation. We introduce a Catalyst Statement (CS) to enhance the intra-data relation, and Indicator Token (IT) to assist the decomposition of a translated sequence into its respective data components. Through our approach, we have achieved a considerable improvement in translation quality itself, along with its effectiveness as training data. Compared with the conventional approach that translates each data component separately, our method yields better training data that enhances the performance of the trained model by 2.690 points for the web page ranking (WPR) task, and 0.845 for the question generation (QG) task in the XGLUE benchmark.


GraSAME: Injecting Token-Level Structural Information to Pretrained Language Models via Graph-guided Self-Attention Mechanism

Yuan, Shuzhou, Färber, Michael

arXiv.org Artificial Intelligence

Pretrained Language Models (PLMs) benefit from external knowledge stored in graph structures for various downstream tasks. However, bridging the modality gap between graph structures and text remains a significant challenge. Traditional methods like linearizing graphs for PLMs lose vital graph connectivity, whereas Graph Neural Networks (GNNs) require cumbersome processes for integration into PLMs. In this work, we propose a novel graph-guided self-attention mechanism, GraSAME. GraSAME seamlessly incorporates token-level structural information into PLMs without necessitating additional alignment or concatenation efforts. As an end-to-end, lightweight multimodal module, GraSAME follows a multi-task learning strategy and effectively bridges the gap between graph and textual modalities, facilitating dynamic interactions between GNNs and PLMs. Our experiments on the graph-to-text generation task demonstrate that GraSAME outperforms baseline models and achieves results comparable to state-of-the-art (SOTA) models on WebNLG datasets. Furthermore, compared to SOTA models, GraSAME eliminates the need for extra pre-training tasks to adjust graph inputs and reduces the number of trainable parameters by over 100 million.


Multi-Objective Latent Space Optimization of Generative Molecular Design Models

Abeer, A N M Nafiz, Urban, Nathan, Weil, M Ryan, Alexander, Francis J., Yoon, Byung-Jun

arXiv.org Artificial Intelligence

Molecular design based on generative models, such as variational autoencoders (VAEs), has become increasingly popular in recent years due to its efficiency for exploring high-dimensional molecular space to identify molecules with desired properties. While the efficacy of the initial model strongly depends on the training data, the sampling efficiency of the model for suggesting novel molecules with enhanced properties can be further enhanced via latent space optimization. In this paper, we propose a multi-objective latent space optimization (LSO) method that can significantly enhance the performance of generative molecular design (GMD). The proposed method adopts an iterative weighted retraining approach, where the respective weights of the molecules in the training data are determined by their Pareto efficiency. We demonstrate that our multi-objective GMD LSO method can significantly improve the performance of GMD for jointly optimizing multiple molecular properties.


The Brain Tumor Segmentation (BraTS) Challenge 2023: Local Synthesis of Healthy Brain Tissue via Inpainting

Kofler, Florian, Meissen, Felix, Steinbauer, Felix, Graf, Robert, Oswald, Eva, de da Rosa, Ezequiel, Li, Hongwei Bran, Baid, Ujjwal, Hoelzl, Florian, Turgut, Oezguen, Horvath, Izabela, Waldmannstetter, Diana, Bukas, Christina, Adewole, Maruf, Anwar, Syed Muhammad, Janas, Anastasia, Kazerooni, Anahita Fathi, LaBella, Dominic, Moawad, Ahmed W, Farahani, Keyvan, Eddy, James, Bergquist, Timothy, Chung, Verena, Shinohara, Russell Takeshi, Dako, Farouk, Wiggins, Walter, Reitman, Zachary, Wang, Chunhao, Liu, Xinyang, Jiang, Zhifan, Familiar, Ariana, Conte, Gian-Marco, Johanson, Elaine, Meier, Zeke, Davatzikos, Christos, Freymann, John, Kirby, Justin, Bilello, Michel, Fathallah-Shaykh, Hassan M, Wiest, Roland, Kirschke, Jan, Colen, Rivka R, Kotrotsou, Aikaterini, Lamontagne, Pamela, Marcus, Daniel, Milchenko, Mikhail, Nazeri, Arash, Weber, Marc-André, Mahajan, Abhishek, Mohan, Suyash, Mongan, John, Hess, Christopher, Cha, Soonmee, Villanueva-Meyer, Javier, Colak, Errol, Crivellaro, Priscila, Jakab, Andras, Albrecht, Jake, Anazodo, Udunna, Aboian, Mariam, Iglesias, Juan Eugenio, Van Leemput, Koen, Bakas, Spyridon, Rueckert, Daniel, Wiestler, Benedikt, Ezhov, Ivan, Piraud, Marie, Menze, Bjoern

arXiv.org Artificial Intelligence

A myriad of algorithms for the automatic analysis of brain MR images is available to support clinicians in their decision-making. For brain tumor patients, the image acquisition time series typically starts with a scan that is already pathological. This poses problems, as many algorithms are designed to analyze healthy brains and provide no guarantees for images featuring lesions. Examples include but are not limited to algorithms for brain anatomy parcellation, tissue segmentation, and brain extraction. To solve this dilemma, we introduce the BraTS 2023 inpainting challenge. Here, the participants' task is to explore inpainting techniques to synthesize healthy brain scans from lesioned ones. The following manuscript contains the task formulation, dataset, and submission procedure. Later it will be updated to summarize the findings of the challenge. The challenge is organized as part of the BraTS 2023 challenge hosted at the MICCAI 2023 conference in Vancouver, Canada.